Skip to content

Conversation

mcolomerc
Copy link

Linked issue: #144

JAVA API for Embedding models.

Ollama support fro Embedding models with ollama4j.

Purpose of change

Support for Ollama Embedding models in Java. (ollama4j)

Tests

OllamaEmbeddingModelConnectionTest

Example: WorkflowEmbeddingsAgentExampleJob provided.

API

Annotations: EmbeddingModelConnection and EmbeddingModelSetup annotations.

embedding.model: BaseEmbeddingModelConnection and BaseEmbeddingModelSetup

Documentation

Integrations: Added Ollama integration for using embedding models.

@xintongsong
Copy link
Contributor

Hi @mcolomerc, I noticed you've been updating this PR for the past week. First of all, thanks for your contribution. And I'm sorry for the late response and review on this. We've been busy with the 0.1.0 release and the Flink Forward conference. Just want you to know that I should be able to get back to the community and review this next week.

@mcolomerc
Copy link
Author

Hi @mcolomerc, I noticed you've been updating this PR for the past week. First of all, thanks for your contribution. And I'm sorry for the late response and review on this. We've been busy with the 0.1.0 release and the Flink Forward conference. Just want you to know that I should be able to get back to the community and review this next week.

yes, basically updated my fork and updated the poms.

Copy link
Collaborator

@wenjin272 wenjin272 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @mcolomerc, thanks for your contribution.

The overall architecture looks good to me, I left some minor comments.

* @param model The specific embedding model to use
* @return An array of floating-point values representing the text embeddings
*/
public float[] embed(String text, String model) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to define these embed methods in the parent class?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added these methods to the parent class, needed to add them as abstract methods to the base interface, forcing all embedding model implementations to support them. what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be fine not to declare it as an abstract method, since most implementing classes may not need to modify the implementation of these methods.

…n property in integrations/pom.xml and reference it in embedding-models/ollama and chat-models/ollama. See: apache#250
Copy link
Collaborator

@wenjin272 wenjin272 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for I forgot a comment in last review.

}

@Override
public float[] embed(String text) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add an additional parameter Map<String, Object> parameters here, like chat method of BaseChatModelConnection.

/**
 * Generate embeddings for a single text input.
 *
 * @param text The input text to generate embeddings for
 * @param parameters The additional arguments to configure the embed request.
 * @return An array of floating-point values representing the text embeddings
 */
public abstract float[] embed(String text, Map<String, Object> parameters);

Then we can remove the embed(String text, String model) and get model name from parameters.

I recommend this implementation because, in our design, different instances of EmbeddingModelSetup can be initialized with different configuration parameters, yet they can share the same EmbeddingModelConnection. When an EmbeddingModelSetup instance calls the embed method of EmbeddingModelConnection, it can pass its own unique configuration parameters, including model_name, encoding_format, and others.

You can take a look at python implementation as a reference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I did a refactor to include the approach and now it's more aligned with BaseChatModelConnection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants